Introduction



One of the old adages in education is that the single best way to
evaluate teachers is to inspect the tests they give their students. Good
teaching and good testing go hand in hand. That is so because
well-constructed tests constitute a very important means by which
teachers motivate and direct student learning, determine how well
students are achieving instructional objectives, and assess how well
they are teaching.

There are many reasons why teacher-made tests may be inadequate.
While teachers may have had experience constructing a few test items
as part of a course in tests and measurement, few have been asked
to develop a complete test (using a test blueprint), give the test to
students, and change it in accordance with an analysis of student
responses. Moreover, some teachers put off test construction to the
last minute. A test constructed in haste and without reference to
instructional objectives will surely fail to motivate and direct student
learning.

There are many other common mistakes made by teachers when
constructing tests. Some ask too many low-level questions. Others
rely on test items supplied by a textbook publisher, unaware that these
items may not be good or appropriate (content valid) items for what
is going on in their classroom.

Many teacher-made test questions are of poor quality because they
are technically inadequate. For example, if an item has two or more
correct answers that depend on different but legitimate interpretations
of the item, then the test suffers from item ambiguity.

In the case of true-false tests, a distinct tendency to ask more true
than false questions will provide an unintended clue to correct
answers. Also, there is evidence that false items discriminate among
high and low achieving students somewhat better than true items. Thus
a teacher might be better off asking more false than true questions.

Finally, many teachers at all levels of education do not analyze
student responses to test items in order to check on the quality of their
tests. There are several relatively simple analytic techniques
available that can help teachers improve their tests. For example, simply
counting the responses that high-achieving and low-achieving students
make to each objective item on a test can reveal how well an item
discriminates between these two groups as well as suggest what might
be done to improve the item.
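
The counting technique just mentioned can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of single letters for response options, and the simple "proportion correct in the high group minus proportion correct in the low group" index are choices made here, not a procedure prescribed in this fastback.

```python
from collections import Counter


def tally_responses(high_group, low_group):
    """Count how often each option was chosen by each group.

    Both arguments are lists of the options chosen on ONE item,
    e.g. ["a", "a", "b"]. Returns {option: (high_count, low_count)}.
    """
    high = Counter(high_group)
    low = Counter(low_group)
    options = sorted(set(high) | set(low))
    return {option: (high[option], low[option]) for option in options}


def discrimination_index(high_group, low_group, key):
    """Proportion correct in the high group minus that in the low group.

    Values near +1 mean the item separates the groups well; values near
    zero or below suggest the item needs another look.
    """
    p_high = high_group.count(key) / len(high_group)
    p_low = low_group.count(key) / len(low_group)
    return p_high - p_low


# Hypothetical responses of five high-achieving and five low-achieving
# students to one multiple-choice item whose keyed answer is "a".
high = ["a", "a", "a", "b", "a"]
low = ["a", "b", "c", "b", "d"]
print(tally_responses(high, low))
print(discrimination_index(high, low, key="a"))
```

The tally also shows which wrong options attract the low group, which is often the most useful hint about how an item might be revised.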

Although the foregoing problems are far from rare, the good news
is that much is known about constructing effective classroom tests.
Any teacher can use this knowledge to improve his or her skills in
measurement and evaluation.

This fastback is about writing good test items, a task faced by most
teachers at all levels of education. Although my main concern is with
the construction of test items, mere technical facility in writing items
is of little value if they do not clearly reflect the objectives of
instruction. I am assuming that teachers are thoroughly familiar with
course content and are skilled in written expression. Nothing said about
test construction will substitute for competent teaching. Thus this
fastback is designed to help good teachers write better items than they
might otherwise write.

This introductory chapter touches on some of the problems associated
with preparing classroom tests. The second chapter outlines five
steps in preparing a test, the PLAN-WRITE System. The third chapter
describes the do's and don'ts of writing commonly used test items.
The last chapter will provide practice in applying these suggestions.



This fastback concentrates on but one aspect of the complex process
of measurement and evaluation: writing technically correct test
items that are compatible with your instructional objectives. You must
decide how far you want to extend your knowledge and skills beyond
the rudiments presented in these pages. Should you decide to
continue, the resources at the end of this fastback should be of help. But
whatever you decide, you will find many useful suggestions that can
lead to writing better test items.
Preparing Teacher-Made Tests

You can prepare good classroom tests by following five basic steps.
We shall refer to these steps as the PLAN-WRITE System. The steps
are:

1. Prepare a content outline. 

2. List instructional objectives.

3. Appraise student performance levels. 

4. Note content, objectives, and levels in a test blueprint. 

5. Write test items. 

Ideally, the PLAN-WRITE system should be implemented before
instruction. Doing so can help you plan instruction. And if you share
the test blueprint (Step 4) with your students, as some specialists
recommend, you can help them prepare for your tests.

Let us now examine each step in turn.

Step One: Prepare Content Outline 

Review your notes, assignments, and textbook materials, then
outline the areas of content that your test items will cover. For example,
for a unit on American government, a partial outline might look like
this:



Content Outline

Content                                  Emphasis

Vocabulary                                  20%
    Federalism
    Popular Sovereignty
    Pocket Veto
    Judicial Review

Facts/Principles                            60%
    Three branches of government
    How a bill becomes a law
    Special powers of the House
    Amending the Constitution

Applications                                20%
    Interviewing government officials
    Reporting on a town meeting

Although categories will differ, it is important to note two things
about good content outlines. First, content should closely match what
you do and what you require your students to do during instruction.
It is neither fair nor good practice to base test items on material that
has been ignored or neglected during instruction. This is a content
validity issue. Second, you will need to estimate in percentage terms
the amount of emphasis each part of the outline received during
instruction. This percentage appears in the right column of the outline.
These estimates will be the basis for determining what proportion of
your test will be devoted to each part of the outline. For example,
assuming that 50 items have been prepared, about 10 items should
be concerned with vocabulary (20% of 50), 30 should cover
facts/principles, and 10 should measure student learning in the area
of applications.
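
The arithmetic in this example is easy to automate. The following Python sketch is a minimal illustration, assuming the emphasis estimates are stored as whole-number percentages; the function name allocate_items is hypothetical and chosen here for illustration only.

```python
def allocate_items(emphasis, total_items):
    """Turn emphasis percentages into a number of items per content area.

    emphasis maps each content area to its estimated percentage of
    instructional emphasis; the percentages are expected to sum to 100.
    """
    if sum(emphasis.values()) != 100:
        raise ValueError("emphasis percentages should sum to 100")
    # Rounding can leave the counts one item over or under the total
    # for awkward percentages, so check the sum before writing items.
    return {area: round(total_items * pct / 100)
            for area, pct in emphasis.items()}


outline = {"Vocabulary": 20, "Facts/Principles": 60, "Applications": 20}
print(allocate_items(outline, total_items=50))
```

With the outline above and a 50-item test, the sketch reproduces the 10, 30, and 10 items worked out in the text.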




Step Two: List Instructional Objectives 

Good test preparation requires good instructional objectives, which
can be assessed in terms of observable behavior or performance. A
behavior or performance objective is a statement about what students
will be able to do or perform after instruction. Behavioral objectives
identify behavior, outcomes, or products that are directly observable.
For example, we can directly observe a person playing the violin,
talking, writing, and so on. Similarly, we can directly observe
outcomes or products: a misspelled word, a complete or incomplete
worksheet, a science project, an essay. Good behavioral statements
make use of action verbs, such as listing, pronouncing, reciting,
selecting, and solving.

Good instructional objectives serve three main functions: 1) they
help teachers focus on important learning experiences, 2) they
communicate expectations to students and others, and 3) they suggest ways
to evaluate learning. As a demonstration of the usefulness of behavioral
objectives in preparing a test, consider the following objectives.

Objective 1: To understand the differences among the three
branches of government.

Objective 2: To identify correct and incorrect examples of
executive, legislative, and judicial action.

Objective 2 more clearly defines your task (you need to provide
examples), it lets your students know what you expect (they will have
to choose from among examples), and it suggests the kind of test items
you might write (perhaps multiple-choice items, which present new
examples).

When writing instructional objectives, avoid words and phrases like
appreciates, comprehends, knows, learns, understands, reads with
ease, etc. These and many other words and phrases do not refer to
directly observable behavior. We cannot see "thinking,"
"understanding," or a student's "interest." The existence or nonexistence
of these things can only be inferred from observable performance or
products.
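
A draft list of objectives can be screened for such words mechanically. The Python sketch below is only a rough aid, assuming a small word list taken from the paragraph above; the names VAGUE_VERBS and vague_terms are hypothetical, and the check cannot judge whether an objective is otherwise well written.

```python
import re

# Single-word verbs named in the text as not directly observable.
VAGUE_VERBS = {"appreciate", "comprehend", "know", "learn", "understand"}


def vague_terms(objective):
    """Return the vague verbs found in an objective, in sorted order."""
    words = re.findall(r"[a-z]+", objective.lower())
    # rstrip("s") lets "knows" and "understands" match the base forms.
    return sorted({word for word in words if word.rstrip("s") in VAGUE_VERBS})


objective_1 = "To understand the differences among the three branches of government."
objective_2 = ("To identify correct and incorrect examples of executive, "
               "legislative, and judicial action.")
print(vague_terms(objective_1))
print(vague_terms(objective_2))
```

Run on the two sample objectives, the check flags the first and passes the second, which matches the judgment made in the text.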



Good behavioral objectives must include a standard or criterion of
performance in addition to the specification of directly observable
behavior. Many writers also recommend a third component: the
conditions under which performance occurs. An example of an acceptable
objective would be, "Students will point out the puns in three
unfamiliar passages with at least 90% accuracy."

Stating good instructional or performance objectives can serve im- 
portant teaching, learning, and evaluative functions, as we have seen. 
But there is no guarantee that they will do so. Writing technically 
correct instructional objectives is one thing; writing important or 
worthwhile ones is another. 

Fortunately, there are several things teachers can do to guard against
writing educationally unimportant objectives. That is the subject of
the third step in PLAN-WRITE.

Step Three: Appraise Performance Levels 

The third step in preparing a test is inseparable from Step 2. As
you list your instructional objectives, you must include objectives that
call for different levels of cognition or thinking. Benjamin Bloom and
his colleagues have developed a taxonomy of instructional objectives
that uses a hierarchy of levels of thinking. The six levels can serve
as a basis for developing test items that assess different types of
instructional objectives. The six levels are:

• Knowledge 

• Comprehension 

• Application 

• Analysis 

• Synthesis

• Evaluation 

Knowledge is the simplest level of cognitive performance and
evaluation is the highest and most complex. The basic idea behind the
hierarchy is that higher-level performances include and are dependent
on lower-level ones. We may think of the first four levels (knowledge,
comprehension, application, analysis) as relating generally to
understanding concepts and principles, while the last two (synthesis
and evaluation) relate to more creative endeavors. The following
discussion on each of the six levels will clarify their relevance to
instructional objectives and test item construction.

Knowledge. An objective written at this level requires students to
reproduce something in more or less the same form as it was presented;
for example, asking students to produce a list of memorized words,
to repeat the solution to a specific problem, or to state facts
verbatim. Action verbs used in objectives written at this level include
define, identify, label, name, recall, recite, recognize, select, and state.
Knowledge-level objectives are relatively easy to write and are very
prevalent in education; in fact, they usually overshadow higher-level
objectives.

Comprehension. At this level, students must not merely reproduce
something; they must understand it to the point of changing it in some
way. For example, when you ask your students to summarize or
paraphrase what they have heard or read, to give their own example of
something, to translate from one language to another, or to read
music, you are calling for responses at the comprehension level. Action
verbs associated with this level include explain, convert, generalize,
interpret, and predict.

Application. Objectives that require students to use a principle, rule,
generalization, or strategy in an unfamiliar setting qualify as
application objectives. These objectives go beyond comprehension in that
they require students to use ideas, principles, and theories, not merely
to paraphrase or explain them. If, for example, you asked students
to collect and classify insects found in their neighborhoods after
teaching the principles of classification, you would be calling for behavior
at the application level. If the students collect and classify only those
insects already studied in class, their performance represents
comprehension, not application. Action verbs associated with application
objectives include choose, compute, demonstrate, employ, implement,
produce, relate, and solve.

Analysis. An analysis objective calls for students to break down
something unfamiliar into its basic parts. It also may require a focus
on the relations among the parts. Analysis depends on skills from lower
performance levels: knowledge (knowing what to look for),
comprehension (translating a concept or principle), and application (relating
translated knowledge to the problem at hand). Words and phrases
associated with this level include deduce cause and effect, diagram,
distinguish, infer mood or purpose, note unstated assumptions, outline,
select relevant particulars, and subdivide.

Synthesis. Objectives written at this level require students to produce
something original or unique. Unlike previous levels, there usually
is no one best or right answer (although judgments are made about
the quality of performance). Synthesis objectives call on students to
respond to unfamiliar problems by putting things together or
combining elements in original ways. When you ask your students to write
an original essay, poem, story, or musical composition, you are calling
on them to be creative. You also are helping them become mature
learners. Words associated with synthesis objectives include
categorize, devise, discover, formulate, and invent.

Evaluation. At this level students are required to judge the quality
or value of an idea, method, product, or human performance that has
a specified purpose and to include reasons for their judgment. Asking
students to judge the quality of a play, to make up titles for a
play, or to create a new play and to provide a rationale for their
responses represents performance at this level. As in the case of
synthesis, there is usually no one best or correct answer. Thus objective
test items may fail to adequately measure complex objectives written
at the levels of synthesis and evaluation. Verbs commonly associated
with this category include appraise, assess, compare, criticize, and
justify.
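
The verb lists given for the six levels can be collected into a simple lookup table. The Python sketch below is a rough first-pass aid only, assuming the single-word verbs listed above (multi-word phrases such as "infer mood or purpose" are left out); the names LEVEL_VERBS and suggest_level are hypothetical. As the insect-collecting example shows, the same task can fall at different levels depending on how familiar the material is, so the suggested level should always be checked by the teacher.

```python
import re

# Single-word action verbs taken from the descriptions of the six levels.
LEVEL_VERBS = {
    "Knowledge": {"define", "identify", "label", "name", "recall",
                  "recite", "recognize", "select", "state"},
    "Comprehension": {"explain", "convert", "generalize", "interpret",
                      "predict"},
    "Application": {"choose", "compute", "demonstrate", "employ",
                    "implement", "produce", "relate", "solve"},
    "Analysis": {"diagram", "distinguish", "outline", "subdivide"},
    "Synthesis": {"categorize", "devise", "discover", "formulate",
                  "invent"},
    "Evaluation": {"appraise", "assess", "compare", "criticize",
                   "justify"},
}


def suggest_level(objective):
    """Return the level of the first listed verb found, or None."""
    for word in re.findall(r"[a-z]+", objective.lower()):
        for level, verbs in LEVEL_VERBS.items():
            if word in verbs:
                return level
    return None


print(suggest_level("Students will define federalism."))
print(suggest_level("Students will justify their rating of a play."))
```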
After preparing a content outline, listing instructional objectives,
and appraising levels of performance, you are ready to combine all
three in a test blueprint. That is the fourth step in PLAN-WRITE.

Step Four: The Test Blueprint 

Test blueprints can help teachers avoid the test preparation
problems mentioned earlier. Of course, describing steps in the preparation
of a test is much easier than applying them. Outlining course
content and writing objectives that reflect different levels of
performance is hard work. But the payoff is a content-valid test that requires
students to think at all cognitive levels.

An example of a test blueprint is shown in Figure 1. It is sketchy,
but it does illustrate the basic features of good test construction. Listed
to the left are a few illustrative objectives that reflect course content.
To the right are the six performance levels. The numbers tell us two
things: how many items will be written for each objective and the
performance level of the items. Thus, in the case of the first
objective listed, the teacher will write eight items to test student
vocabulary; four of these will test at the knowledge level and four at the
level of comprehension. Item percentages reflect the teacher's
estimate of the emphasis received by each major part of the content
outline. Finally, as you can see, the test will have a total of 50 items.
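
One row of such a blueprint can be checked with a short calculation. The Python sketch below is a minimal illustration, assuming a row is stored as a mapping from performance level to number of items; the names LEVELS and row_summary are hypothetical. It uses only the vocabulary row described above (four knowledge items and four comprehension items on a 50-item test).

```python
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]


def row_summary(level_counts, total_items):
    """Return (number of items, percentage of the test) for one objective."""
    unknown = set(level_counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown performance level(s): {sorted(unknown)}")
    n_items = sum(level_counts.values())
    return n_items, 100 * n_items / total_items


vocabulary_row = {"Knowledge": 4, "Comprehension": 4}
print(row_summary(vocabulary_row, total_items=50))
```

Summing the item counts over all rows and comparing the result with the intended test length is a quick way to confirm that a blueprint is complete.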

When you have built a good test blueprint, you are well on your
way to preparing a content-valid test. And that takes us to the last
step in PLAN-WRITE.

Step Five: Write Test Items 

The rest of this fastback is devoted to writing good test items.
Although teachers use a variety of ways to observe student behavior
(work samples, checklists, rating scales, self-report devices), we shall
focus on test items, especially those that teachers use most often -
true-false, multiple-choice, matching, completion, and essay items.
[Figure 1. Sample test blueprint: illustrative objectives listed against
the six performance levels, with the number and percentage of items for
each objective. Total: 50 items.]



The first four (true-false, multiple-choice, matching, and
completion) are considered objective items, in that those who score them
end up with the same score (barring clerical errors). Essays are
considered subjective items in that those who score them often end up
with different scores. Sometimes true-false, multiple-choice, and
matching items are referred to as fixed-response items (students must
choose from the options offered), while completion and essay items
are referred to as free-response or supply-type items (students
supply their own answers). Since too much freedom of response may
lead to variability in scoring, care must be taken in writing
completion items.

It is important to point out that test construction is more art than
science. This is especially true of teacher-made tests. Thus many of
the suggestions for preparing a test are more a matter of expert
judgment than of scientific verification. There is some evidence that
well-prepared teacher-made tests are as reliable (consistent in
measurement) as many standardized tests and even more valid for a
particular student or class.

The next chapter will provide some do's and don'ts for writing com- 
monly used types of test items. 
Writing Test Items



This section begins with some general guidelines that apply to
writing test items. Later, suggestions specific to each type of item will
be offered. No guideline is correct for all situations, so use a little
common sense if a guideline seems inappropriate.

1. Make certain that the performance called for in your test item
closely matches that specified in your objective. This is the most
important guideline of them all. For example, one of the objectives from
the test blueprint presented in Figure 1 was, "Students will calculate
force for objects of different mass and acceleration." A multiple-choice
test item asking, "Which of the following formulas is used to
determine force?" clearly would not meet that objective.

2. Eliminate trivia. Each test item should measure an important
aspect of the subject matter. Deciding whether an item is trivial requires
a judgment call. Answering these questions might help make this
judgment: Does the item test something worth knowing? Will responding
correctly make a difference in the competency level of my students?
If you answer "no" or "maybe not," you have probably written an
unimportant item.

3. Stay away from textbook language. A slavish reliance on the text
will almost certainly lead to low-level items.

4. Strive to write objectives and test items that call for higher-level
performances. This guideline is a corollary to the previous one.
Compare the items below. Both deal with the interference effects of our
actions on memory, but only one follows the guideline.
1) Forgetting something you learned yesterday because you learned
something else today is an instance of:

a. interference

b. proaction

c. decoding

d. disassociation

2) If you wanted to make practical use of the research on forgetting,
what would you recommend to a friend who has but two hours to
prepare for next week's 8:00 a.m. exam?

a. Study a week in advance

b. Prepare the night before

c. Study two consecutive hours

d. Prepare at least one day before

The first question is probably measuring only memory. The
second is at least an application item, if we assume that the problem is
a new one for the student. With a little thought, many lower-level
items can be converted to higher-level ones. Incidentally, the ability
to formulate higher-level items is indicative of one's command of the
subject matter.

5. Write unambiguous questions. Your questions must have
unmistakable meanings and not be open to more than one equally
reasonable interpretation. Note the ambiguity in the following item.

1) Governments are very dissimilar in the way they operate. (True or 
False) 

Unless the teacher or text actually made the statement, the item is
unanswerable because it is ambiguous. Who is to say what "very
dissimilar" and "in the way they operate" mean? And if the teacher or text
did make the statement, the item is a good candidate for a trivia award.

6. Make certain that objective items have answers that are clearly
correct - answers that experts could accept. Asking colleagues to
respond to your items can help. Asking competent students for
feedback also can help.



7. Arrange items in an order that differs from that used during
instruction. If this is not done, students may profit from "order clues"
when answering questions. A useful technique is to begin your test
with a few relatively easy items. Then choose items in random order
so they do not follow the sequence of the text or your presentations.

8. Watch for items that cue answers to other items. Do you recall
getting this kind of help on exams?

9. Make items independent of one another. An answer to one item 
should not depend on knowing the answer to some other item. 

10. Keep the reading level simple. Constructing items using tech- 
nical words that are related to the subject being tested is acceptable 
practice; making the reading level of items difficult by using unfamiliar 
or uncommon words is not (unless, of course, one's goal is to assess 
vocabulary or reading skill). 

11. Cite the source of an opinion. It is unfair to ask students
to endorse something that is largely a matter of opinion. The second
true-false item below is better than the first because it follows this
guideline.

1) Martin Luther King acted on the basis of postconventional moral
principles. (True or False)

2) According to the authors, Martin Luther King acted on the basis 
of postconventional moral principles. (True or False) 

12. Try preparing your questions long before using them.
Following this suggestion will promote thoughtful questions. An added
bonus is having time to review and revise items in need of repair.
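
The ordering technique described in guideline 7 (a few easy items first, the rest in random order) can be sketched in Python as follows. This is only an illustration: the "easiness" value attached to each item is an assumption (for example, the teacher's estimate of the proportion of students who will answer correctly), and the function name order_items is hypothetical.

```python
import random


def order_items(items, n_openers=3, seed=None):
    """Put the easiest items first, then the remaining items in random order.

    items is a list of dicts with an "id" and an "easiness" value.
    Passing a seed makes the shuffled order repeatable.
    """
    rng = random.Random(seed)
    ranked = sorted(items, key=lambda item: item["easiness"], reverse=True)
    openers, rest = ranked[:n_openers], ranked[n_openers:]
    rng.shuffle(rest)  # break the sequence used during instruction
    return openers + rest


# Hypothetical item pool, listed in the order the content was taught.
pool = [
    {"id": 1, "easiness": 0.55},
    {"id": 2, "easiness": 0.90},
    {"id": 3, "easiness": 0.40},
    {"id": 4, "easiness": 0.85},
    {"id": 5, "easiness": 0.60},
    {"id": 6, "easiness": 0.70},
]
print([item["id"] for item in order_items(pool, n_openers=2, seed=1)])
```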

So much for general guidelines. We now turn to specific
guidelines for writing objective and essay questions. At times, one or more
of the above guidelines may be repeated as they relate to specific types
of items.




True-False Test Items 



True-false items have been criticized by both lay and professional
people. It is said, for example, that true-false tests are ambiguous,
measure trivia, promote rote learning, and are prone to error because
they encourage guessing. And, of course, poorly written true-false
items suffer from all of these problems. But there is not much
research evidence that these are inherent problems. Consider, for
example, the following items. Assuming that the questions posed are
new to the student, are they ambiguous? Trivial? Do they measure
reasoning?

1. A man firmly believes that murder is a crime because it is wrong.
   This belief illustrates stage 4 in moral development.

2. If a perfect inverse relationship existed between complaining and
   having friends, the more one complained the fewer friends one
   would have.

The first item was designed to test student understanding of
Lawrence Kohlberg's moral stages of development. The second is
concerned with the interpretation of the correlation coefficient. The point
is that true-false items skillfully written can be unambiguous, call for
worthwhile learning, and need not measure only rote learning.

Blind guessing is a potential problem with true-false items. But since
there is some evidence that few students resort to blind guessing
("informed guessing" is more likely), the problem may have been
exaggerated. To combat the problem of guessing, students are sometimes
asked to correct false statements by indicating why they are false.
In such cases the scoring of the items may no longer remain
objective. Using formulas to "correct" for guessing on teacher-made tests
is not recommended by testing experts, unless many students do not
have time to complete tests.

True-false items are sometimes thought to be easy to prepare, but
this advantage is more apparent than real. It takes hard work and skill
to write good items. But if good items are written, two advantages
follow: 1) they provide a simple and direct way to measure student
achievement, and 2) they are efficient - they provide many
independent, scorable responses per unit of test time. True-false items can
sample much of what a student has learned and are especially useful
when asking questions about stimulus materials such as films, maps,
diagrams, and graphs.



True-False Item Guidelines 

1. Base the item on a single idea or proposition. Single-idea
questions are relatively easy to understand. They are also more likely to
be better questions than those that are longer and more complex. Thus
item A below is acceptable; but item B, because it expresses two ideas,
is not.

A) Measurement and evaluation are synonymous terms. 

B) Measurement refers to assigning numbers systematically, while 
evaluation refers to making judgments about the meaning of as- 
signed numbers. 

2. Write items that test an important point. 

3. Avoid lifting statements directly from the class text. Some
teachers, for example, merely insert "not" in a statement taken from
the text. Unless memorizing exact content is essential, this approach
will seldom lead to good results.

4. Be concise. Your questions should be as brief as possible. 

5. Write clearly true or clearly false items. It is not necessary to 
write perfectly true or perfectly false items so long as the answers 
are defensible - that is, so long as well-informed people would agree 
with keyed answers. Writing items in pairs can help you do this. One
item should be true, the other false. You will, of course, choose only
one item from a pair for the test.

6. Eliminate giveaways. Giveaways are unintended clues to
correct answers. Let us take a look at three of these.

a) True statements tend to be longer than false ones, due perhaps
to the addition of qualifying words to make an item true. Item
length may then cue correct answers. Make your true and false
items approximately equal in length.

b) Some teachers have a tendency to write many more true than
false questions or many more false than true questions. When
students catch on to this, they benefit from a second kind of
giveaway. Thus approximately half your items should be true
and half false. However, because there is evidence that false
statements discriminate between high and low achieving
students better than true ones do (people tend to answer "true" when
in doubt), you may want to include a few more false items.

c) The use of specific determiners (extreme words and certain
modifiers) can cause trouble. Because completely true
statements are rare, strongly worded statements are likely to be false.
Thus try to avoid such words as all, always, never, only,
nothing, and alone. Many students know that these words are
likely to be found in false statements. Words or phrases like may,
could, as a rule, sometimes, often, in general, occasionally,
and usually should be avoided also because they are associated
with true items. If you must on occasion use a specific
determiner, use it in a way contrary to expectation.

Some readers at this point may think that these guidelines and those
that follow are designed to trap or trick students. This is not the case.
A test that allowed a student to get 5 of 25 questions right because
of item length or specific determiners would not be measuring
knowledge. The more one allows this kind of student response, the less valid
one's test is likely to be.

7. Beware of words denoting indefinite degree. The use of words
like more, less, important, unimportant, large, small, recent, old, tall,
great, and so on, can easily lead to ambiguity.

8. State items positively. Negative statements may be difficult to
interpret. This is especially true of statements using a double
negative (which should not be used in any type of test item). In addition,
even well-prepared students can easily overlook such words as not
or never. If on rare occasions a negative word is used, be sure to
underline or capitalize it.

9. Beware of detectable answer patterns. Test-wise students can
detect answer patterns (TTTTFFFF) designed to make scoring easier.
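
Several of the giveaways discussed in guidelines 6 and 9 can be checked mechanically before a test is duplicated. The Python sketch below is a rough screening aid under stated assumptions: items are stored as (statement, is_true) pairs, length is measured in words, and the names ABSOLUTES, longest_run, and giveaway_report are hypothetical. It reports the share of true items, the average length of true and false statements, statements containing the extreme words listed in guideline 6c, and the longest run of identical keyed answers.

```python
import re
from itertools import groupby

# Extreme words that guideline 6c associates with false statements.
ABSOLUTES = {"all", "always", "never", "only", "nothing", "alone"}


def longest_run(answer_key):
    """Length of the longest run of identical answers, e.g. 4 for TTTTFFFF."""
    return max((len(list(group)) for _, group in groupby(answer_key)),
               default=0)


def giveaway_report(items):
    """Summarize possible giveaways in a list of (statement, is_true) pairs."""
    if not items:
        raise ValueError("no items to check")
    true_lengths = [len(text.split()) for text, is_true in items if is_true]
    false_lengths = [len(text.split()) for text, is_true in items if not is_true]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    flagged = [text for text, _ in items
               if ABSOLUTES & set(re.findall(r"[a-z]+", text.lower()))]
    key = "".join("T" if is_true else "F" for _, is_true in items)
    return {
        "true_share": len(true_lengths) / len(items),
        "mean_true_words": mean(true_lengths),
        "mean_false_words": mean(false_lengths),
        "absolute_words": flagged,
        "longest_run": longest_run(key),
    }


# A tiny hypothetical test, in the order the items would appear.
draft = [
    ("Measurement and evaluation are synonymous terms.", False),
    ("All true-false items measure only trivia.", False),
    ("A perfect inverse relationship has a correlation coefficient of -1.", True),
]
print(giveaway_report(draft))
```

A report like this only points to items worth a second look; whether a long true statement or an extreme word actually cues the answer remains a matter of judgment.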

Multiple-Choice Test Items 

Test specialists regard the multiple-choice item highly, thus their
widespread use on standardized achievement and aptitude tests.
However, critics of multiple-choice items complain about poorly written
items.

One of the critics' complaints is that multiple-choice questions
measure little more than vocabulary, isolated facts, and trivia; and very
often they are right. But in skilled hands, multiple-choice items can
be impressively adaptable in measuring many important educational
outcomes.

The basic components of multiple-choice questions are the item
stem, which presents a problem, and several response options (usually
3 to 5), which follow under the stem. One of the options is correct
or clearly the best answer to the problem. The other options
(distractors) are designed to be attractive to the uninformed.

The "interpretive exercise" format is widely used in standardized
testing because it provides a potentially excellent means to test higher
levels of achievement or aptitude. This format presents the testee with
a paragraph, graph, picture, poetry, work of art, musical
composition, or any other complex material. The student is then asked a
series of questions that require interpretation, application, analysis,
synthesis, or evaluative thinking. If you have never done so, try your
hand at using this format. It is capable of tapping many forms of
high-level performance.




Multiple-Choice Item Guidelines

1. Select the format of the item with care. Although there are several
variations of the multiple-choice format, those that follow can serve
you well.

a) To make reading easy, response options are listed vertically
rather than arranged in tandem.

b) Response options follow logically and grammatically from the
item stem.

c) The stem presents a complete problem (more about this later).

d) No punctuation marks are used when options contain numbers 
(they might be misread as decimals). 

e) All items need not have the same number of response options. 

2. State a clearly formulated problem in the stem. The examinee
should not have to complete the problem by consulting the response
options. The stem may be either a complete question or an
incomplete sentence, so long as a specific problem is formulated. It is
probably a good idea to formulate a complete question whenever you can.
Consider this item:

Mars is 

a. closer to the sun than Jupiter 

b. 93,000,000 miles from the sun 

c. the third closest planet to the sun 

Because the stem fails to present a complete problem, the item is
functioning as a true-false item. Its stem is much like these item stems:
"Which of the following is true?" and "Select the false statement from
the following." "Multiple-choice" questions of this sort should probably
be cast in a true-false format.

3. State item stems positively. Negatively stated stems not only may
lead to confusion, they may fail to reflect the kinds of problems
students experience in everyday life.

If on rare occasions a negative word is used (not, never), be sure
to underline or capitalize it. The advice with respect to double
negatives - a negative in the stem and in one or more of the response
options - is even stronger: Eliminate them entirely.

4. Write the stem so that the answer is placed at the end. Thus the 
first stem below is preferable to the second. 

a) The term for the chemical activities of all living things is 

b) ________ refers to the chemical activities of all living things.

5. Be creative. This guideline challenges you to take advantage of
the versatility of the multiple-choice item for measuring important
educational outcomes. Strive for this arbitrary goal: At least half of
the items will be above the knowledge level.

6. Be concise. Being concise not only promotes clarity of expres- 
sion, but saves valuable testing time as well. 

7. Make all distractors plausible. There should be a degree of truth
in each distractor in order for the uninformed to find them attractive.
If it is unlikely that anyone would choose an option, why include it
at all? On occasion, even two options would be preferable to offering
implausible distractors.

8. Make certain there is only one clearly best answer. 

9. Avoid using the options "all of these" and "none of these." These
options tend to be overused by those who have difficulty formulating 
plausible distractors. They also are associated with item ambiguity. 
Neither should be used unless the answer to an item is absolutely 
correct. 

10. Eliminate unnecessary repetition. If a phrase is repeated in each
response option, add it to the stem. This eliminates unnecessary words.
The more students engage in unnecessary reading, the fewer questions
they can respond to per unit of test time.

11. Eliminate giveaways. Let us consider five kinds of giveaways
in multiple-choice items.

a) Test-wise students can detect any tendency to make correct
answers longer than distractors. The remedy is to make your
response options approximately equal in length.

b) Test-wise students also can spot specific determiners when they
are used in item stems and response options. Watch for them.

c) Using the same or similar words in both the item stem and the
correct answer can give away the answer.

d) Beware of grammatical giveaways. For example, if the stem
ends with the word "an" and only one or two options begin with
a vowel, then the student can easily eliminate distractors.
Similarly, if the stem has a singular verb and one or more of the
options are plural, students are given an extra clue to the
correct answer.

e) Alert students may detect any tendency to prefer certain
response options. For example, students may learn that option
"c" is most often correct or that option "a" is seldom correct.
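
Two of these giveaways, answer length (a) and answer position (e), can be checked mechanically before a test is duplicated. The following is only a rough sketch in Python; the function names, the sample items, and the rule "flag an item when the keyed answer is strictly longer than every distractor" are illustrative assumptions, not procedures taken from this fastback.

```python
from collections import Counter


def longest_option_is_key(options, key_index):
    """Return True when the keyed answer is strictly longer than every distractor."""
    lengths = [len(text) for text in options]
    key_length = lengths[key_index]
    return all(key_length > n for i, n in enumerate(lengths) if i != key_index)


def key_position_counts(key_indices, labels="abcde"):
    """Count how often each option position (a, b, c, ...) holds the keyed answer."""
    counts = Counter(labels[i] for i in key_indices)
    used_labels = labels[: max(key_indices) + 1]
    return {label: counts.get(label, 0) for label in used_labels}


# Hypothetical three-item test: (response options, index of the keyed answer).
items = [
    (["cell", "the sum of all chemical activity in an organism", "tissue"], 1),
    (["1776", "1789", "1812"], 2),
    (["iron", "zinc", "gold"], 2),
]

# Item numbers (1-based) whose keyed answer is the longest option.
flagged = [number for number, (options, key) in enumerate(items, start=1)
           if longest_option_is_key(options, key)]

# How often each position is correct across the whole test.
positions = key_position_counts([key for _, key in items])

print(flagged)
print(positions)
```

In this made-up test only the first item is flagged for length, and the position count shows that option "a" is never correct, which is the kind of pattern an alert student could learn to exploit.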

12. Order response options. Arrange response options in some logical
sequence, if possible. This will help students locate choices.
Names could be ordered alphabetically, dates chronologically,
formulas in terms of complexity, and so on.

Matching-Test Items 

A matching-test item format typically consists of two columns: a
premise set on the left and a response set on the right. Students are
asked to match items in the two columns. Further, there are a variety
of matching formats that differ in complexity from the simple
two-column format. For example, matching items can call for students
to select an author and a concept for each descriptive statement listed.
Some of these formats can be quite demanding, requiring performances
well above the memory level.

Matching items are especially effective in prompting students to
see relationships among a set of items and to integrate knowledge.
However, they are less suited than multiple-choice items for measuring
higher levels of performance.

Matching-Test Item Guidelines 

1. Provide directions. Students should not have to ask, for exam- 
ple, whether options may be used more than once. 

2. Use only homogeneous material. Each item in a set should be 
the same kind as the other items, for example, all authors or all cities. 
When different kinds of items are used in each set, the associations 
tend to be obvious. 

3. Place longer material in the left column. This will help students 
locate matches. 

4. Arrange column material in some systematic order. For example,
names can be arranged alphabetically, and so on.

5. Keep columns short. As a rule of thumb, the premise set should
contain no more than 3 to 7 items; the response set should contain
2 items more than the premise set.

6. Keep an item on one page. Arrange items so that students will
not have to turn pages back and forth as they respond. Placing one
item on two pages can become quite frustrating to students.



Completion-Test Items 

Completion items ask students to supply an important word, number,
or phrase to complete a statement. Blanks are provided to be
filled in by the student.

Completion items are especially useful in the early elementary 
grades where vocabulary is growing in basic subjects. They also are 
especially effective in mathematics and science when answers to prob- 
lems require computation. Completion items are easy to write and 
provide efficient measurement. And they are not as susceptible to
guessing as are true-false and multiple-choice items.

But there are drawbacks. Questions can easily be turned into
subjective items, in which responses would vary so greatly that
subjectivity would enter into scoring. Another drawback is that completion
items are more suited to measuring lower-level than higher-level
performances, except when problems are presented, as in the case
of mathematics and science.



Completion-Test Item Guidelines

1. Call for answers that can be scored objectively. Prefer single
words and short phrases. Check your items by posing this critical
question: Can someone with no competency in the subject score the
items objectively by relying solely on the answer key?

2. Prepare a scoring key that contains all acceptable answers for
each item.

3. Beware of open questions. Open questions are those that invite
unexpected but reasonable answers, as in the following case:

The author of Profiles in Courage was (John F. Kennedy).

But what is one to do with these answers: "President," "assassinated,"
"privileged"? The remedy is to close the item so there is but one
answer to be scored objectively. For example, ask for "the name of the
author."

Remember to key all acceptable answers. Thus a key for this item
might contain John F. Kennedy, John Kennedy, J.F. Kennedy, and
J. Kennedy. Some teachers might not accept just Kennedy.
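
For teachers who record answers electronically, the idea of a key that lists every acceptable answer might be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and the normalization rule (ignore capitalization, periods, and extra spaces) is an assumption about what a teacher would treat as the same answer.

```python
def normalize(answer):
    """Lowercase, drop periods, and collapse whitespace so trivial variants match."""
    return " ".join(answer.lower().replace(".", " ").split())


def score_completion(response, acceptable_answers):
    """Return 1 if the response matches any keyed answer after normalization, else 0."""
    keyed = {normalize(answer) for answer in acceptable_answers}
    return 1 if normalize(response) in keyed else 0


# The key suggested in the text for the Profiles in Courage item.
key = ["John F. Kennedy", "John Kennedy", "J.F. Kennedy", "J. Kennedy"]

print(score_completion("john f kennedy", key))  # a keyed variant
print(score_completion("Kennedy", key))         # not keyed, so no credit
print(score_completion("President", key))       # an "open question" answer
```

Because the key is explicit, anyone can score the item the same way; a teacher who does accept "Kennedy" alone would simply add it to the list.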

4. Place blanks near the end of the statement. Try to present a complete
or nearly complete statement before calling for a response.

5. Eliminate giveaways. Two giveaways are:

a) The length of the lines to the right of an item can provide clues
to the correct answer. The remedy is to make all blanks of equal
length.

b) Grammatical structure also can provide clues. For example,
using the word "an" can alert the student to the fact that the
answer begins with a vowel.

6. Limit the number of blanks to one or two per item, if possible.
Statements with too many blanks waste valuable time as students
attempt to figure out what is being asked.

7. If a numerical answer is called for, indicate the units in which
it is to be expressed.

Items that call for more than a word or phrase are sometimes classified
as restricted-essay items. It may be preferable to call them
written-response items. The term "essay" can be reserved for items sharing
the characteristics identified in the next section, to which we now turn.

Essay Questions 

Essay items ask students to supply written answers to questions
(sometimes questions and answers are oral). Judgments are then made
about the accuracy and quality of their answers. Though the lines of
demarcation are not clear, it is convenient to place items requiring
written responses in one of three classes:

1. Written-response items. These items call for answers that fall
in the knowledge or comprehension categories of performance.
Responses to written-response items may consist of a fact or opinion
or as much as a student can remember about a specified topic. Thus
answers may range in length from a sentence or two to many pages.
Written-response items are very common in education. Although they
can and do serve important teaching, learning, and evaluative
functions, they are sometimes pejoratively referred to as "regurgitation
items." Here is an example of a written-response item:

What three functions do instructional objectives serve?

2. Restricted-response essays. We shall here reserve the term essay
for questions that present an unfamiliar problem to the student.
The student, in turn, recalls relevant concepts, facts, and principles;
organizes these recollections; and writes a coherent and creative
response to the problem. Restricted-response essays call for responses
that are roughly a page or less in length. They may be distinguished
from written-response items in that answers fall above the
knowledge-comprehension levels of performance. Here is an example of a
restricted-response essay (assume that the problem is unfamiliar to
the students):

Compare verbal and nonverbal art. How are they similar? Different?

3. Extended-response essays. These differ from restricted-response
essays in that the questions posed are more complex, hence they
require extended answers. Extended-response essays call on a range
of talents, from knowledge to evaluation. A competent evaluation of
the answers also requires talent. Here is an example of an
extended-response essay:

Imagine that you visit Earth in 2500 A.D. You find astonishing
changes in education. Students spend more time in school, study more
things, and almost never drop out. Educators no longer talk about
individualizing instruction; they practice it completely. Every student
is presented with different learning materials and activities. In fact,
students are similar academically in only one way: Each is developing
knowledge and skills in reading, writing, speaking, languages,
mathematics, scientific method, problem solving, technology, and
creativity. You observe that students are not given tests as you knew
them, yet they are learning several times more in half the time needed
in 2000 A.D. They spend half their time in artistic, literary, musical,
physical, and community activities. Highly paid teachers work in teams.
No one lectures. Some focus on analyzing knowledge sets, others work
on instructional programming that hardly resembles that which you
knew, still others provide counseling for individual students.
Centuries of use have taught teachers that technology is the only means
to deliver instruction fairly and effectively. You conclude that
education in 2500 A.D. is characterized by three things: a highly
sophisticated concern with what is presented to a student, a delivery system
that allows complete student involvement, and a technology that
provides immediate feedback for student action.

Write a well-organized essay of 400 to 600 words discussing changes
in learning theory, in views toward students, and in the politics of
education that might account for the educational system of 2500 A.D. You
will be judged not only by what you say, but how effectively you use
references. Devote approximately 30 minutes to planning, 80 to
writing, and 10 for proofreading and revision.

Many educators are quite generous in their praise of essay items.
It has long been said that essays provide the best overall means of
assessing achievement. But this claim is not well established. In fact,
we may eventually discover that essay questions have little distinct
advantage over objective questions in measuring academic achievement
(except, perhaps, in the areas of synthesis and evaluation). This
is not to say that essay questions have no advantages over objective
questions. It is to say that their advantages may lie elsewhere.

The most important aspect of essay questions is that they provide
relative freedom of response. Whenever you want your students to
select from an array of information, organize what they select, and
express themselves in writing with a minimum of constraint, the essay
question is unequaled. That is its main advantage. And surely
the ability to write a quality essay about a subject of interest is a very
important educational outcome.

But there are some serious drawbacks. First, the scoring of essays
is inconsistent (unreliable). Studies have shown that teachers will
independently assign a range of scores to the same essay paper.
Further, any one teacher may well change his assessments of a set of
essay papers from one scoring session to another. Second, because
only a relatively small number of questions can be asked in a typical
testing session, essay tests tap a smaller sample of student
achievement than do objective tests (this also weakens reliability). And third,
because the item-writer is seldom called on to reveal her scoring
methods, item and scoring deficiencies are less open to scrutiny than
are deficiencies in objective test items.

Guidelines for Constructing Essays

1. Use essay items to assess complex learning outcomes. Avoid
starting essay questions with terms like name, state, who, what,
where, and when. You are likely to end up with written-response items
when you do. To encourage higher forms of thinking, consider using
such words or phrases as compare, argue for or against, speculate
about the causes for, reorganize, hypothesize, take a position and
defend it, and so on.

2. Favor restricted-response essays. It is generally advisable to
construct essays that can be answered in about 15 minutes. Following
this guideline will provide a broader sampling of student achievement.
The scorer's task also will be more manageable.

3. Structure the problem. Structure is provided when items specify
what students are to do and the basis on which their answers will be
judged. Additional information, such as how long students should
work on an essay, is also recommended, as in the example of an
extended-response essay item presented earlier.

4. Prepare model answers before asking students to respond.
Following this guideline will help you decide which questions need to
be altered or eliminated.

5. Allow sufficient time to answer. There is some tendency to ask
too many essay questions in a single testing period. This can result
in frantic efforts to write as much as possible; in the end, quality is
sacrificed for quantity. If students are to think and outline before they
write, they must have time to do so.

6. Encourage thoughtful answers. You might try this out:

a) Prepare model answers (guideline 4); then

b) give students more time to answer than you needed; and

c) let students know that thoughtful answers are expected by
giving them access to earlier model answers, requesting that they
think and outline before writing, and scoring all essays
carefully - this includes writing positive and constructive
negative comments on each essay paper.

7. Require all students to answer the same questions. Letting
students decide which of several questions to answer may be the
popular thing to do, but it is bad practice when instructional objectives
are the same for all students. When each student answers a different
set of questions, the basis for comparing their answers is weakened.

Guidelines for Scoring Essays 

Inconsistency in scoring essays can be diminished by following these 
guidelines. 

1. Use model answers. Evaluate student answers by comparing them
to the model answers prepared before the test. You may need to
adjust model answers slightly to accommodate student response patterns.
This is the most important scoring guideline.

2. Score the same question on all papers before going on to the
next question. Focusing on one question at a time across papers will
help you compare answers.

3. Cover student names. If possible, keep from identifying the essay
writer. Following this guideline will reduce the likelihood of biased
scoring.

4. Read each essay twice before scoring. Better still, ask a colleague
to spot-check your scoring. Putting this guideline into effect may be
close to impossible for busy teachers. Nevertheless, reading essays
more than once is likely to increase scoring accuracy.

Perhaps enough has been said about the do's and don'ts of item
writing. The next chapter provides practice in applying some of the
suggestions.



Some Practice 



The following items violate one or more of the guidelines presented 
in the last chapter. A few items may be acceptable. To provide clari- 
ty, the items are written at the knowledge or comprehension level. 
See if you can spot the problems with these items. 

True-False Items 

1. The most important aspect of essay questions is that they
provide relative freedom of response.

2. Mozart's contribution to symphonic music has been extensive.

3. Criminologists sometimes disagree about the use of punishment
to diminish crime.

Let's see how you have done. Item 1 fails to identify the source
of the opinion. The word "extensive" in Item 2 needs to be qualified;
otherwise the item will remain ambiguous. Item 3 uses a specific
determiner (sometimes) and is trivial.

Completion Items 

4. Another (name) for table of specifications is test blueprint.

5. If a teacher wants to evaluate skills in selecting, organizing, and
synthesizing, she should favor an (essay) item.

6. The Empire State Building is in (New York).

Item 4 suffers from two problems: the blank is in the beginning
rather than near the end of the question, and the answer is
unimportant. It would be easy to rewrite the question so that the answer is
table of specifications or test blueprint. Did you spot the
grammatical giveaway in Item 5? The "an" before the blank eliminates all
objective-type items considered in this fastback! Item 6 is slightly open.
What does one do with "mid-Manhattan"? It can be repaired by
adding mid-Manhattan to the key or adding "the city of" to the item.

Multiple-Choice Items 

7. Which of the following qualify as objective test items?

a. True-false 

b. Multiple-choice 

c. Restricted-response essay 

d. a and b 

e. b and c 

8. Reliability

a. refers to consistency in measurement 

b. is a synonym for objectivity 

c. refers to the error portion of a score 

d. declines as the length of a test increases 

9. Morphemes consist of one or more 

a. clauses 

b. words 

c. phonemes

d. sentences

Note the grammatical giveaway in Item 7 - the plural stem directs
students away from options "a," "b," and "c." Thus only two viable
options remain. Item 8 illustrates a very common fault - the item
stem fails to present a complete problem. Finally, what do you
suppose an uninformed student would do with Item 9? Undoubtedly some
would receive credit by merely associating the similar-looking words,
morphemes and phonemes.

Essay Items



10. Tell all you know about the Treaty of Guadalupe Hidalgo.

11. What is content validity?

12. Discuss achievement testing.

Item 10: Many years ago I responded as a fifth-grader to this exact
item. I answered correctly (I said I knew nothing about the treaty),
but received no credit for my sincere effort. The teacher remained
impervious to my complaints. This item, as defined in this fastback,
is at best a written-response question. Needless to say, I think it should
be stated a little differently.

Like the previous question, Item 11 calls for knowledge or
comprehension. It, too, will probably function as a written-response
question.

Item 12: Volumes have been written about achievement testing.
What is to be discussed? Teacher-made tests? Standardized
achievement testing? Norms and their interpretation? Validity issues?
Paper-and-pencil or other forms of achievement testing? If students have
20 or 60 minutes to answer, they must know what is expected and
how their answers will be evaluated. Lack of structure, as in the case
of this item, is likely to provoke both a wide range of answers and
subjectivity in scoring.

A Closing Note



Following the PLAN-WRITE system will help you prepare good,
content-valid tests. But once your tests have been administered, there
is an additional step to be taken. You should do an item analysis. An
item analysis provides information about item difficulty (items that
are too difficult or too easy will hurt test reliability) and item
discrimination (you may want to know how well items discriminate
between high-achieving and low-achieving students). Acting on the
results of an item analysis can improve the general quality of a test.
The references on the next page will help you locate simple item
analysis procedures.
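
For readers who keep scores on a computer, one common simple procedure can be sketched in a few lines of Python. The sketch rests on assumptions not stated in this fastback: item difficulty is taken as the proportion of all students answering the item correctly, and item discrimination as the difference between that proportion in an upper-scoring group and in a lower-scoring group. The function name, the default group size of 27 percent, and the score table are all illustrative.

```python
def item_analysis(responses, group_fraction=0.27):
    """Compute (difficulty, discrimination) for each item.

    responses: one list of 0/1 item scores per student.
    difficulty: proportion of all students who answered the item correctly.
    discrimination: proportion correct in the upper group minus the
    proportion correct in the lower group (groups chosen by total score).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_students * group_fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    results = []
    for i in range(n_items):
        difficulty = sum(student[i] for student in responses) / n_students
        p_upper = sum(student[i] for student in upper) / group_size
        p_lower = sum(student[i] for student in lower) / group_size
        results.append((difficulty, p_upper - p_lower))
    return results


# Hypothetical scores: six students (rows) on a four-item test (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

# With so few students, split the class into upper and lower halves.
results = item_analysis(scores, group_fraction=0.5)
for number, (difficulty, discrimination) in enumerate(results, start=1):
    print(number, round(difficulty, 2), round(discrimination, 2))
```

In this invented data the second item separates the two groups completely, while the fourth item is answered equally well by both groups and so tells the teacher nothing about who has learned the material. The procedures in the references that follow should be preferred for real decisions.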

Finally, let me urge you to continue your studies. Whether you are
concerned with item writing, item analysis, using tests for non-grading
purposes, criterion versus norm-referenced testing, standardized tests
and their interpretation, or most other topics in educational
evaluation, you will find the literature in the area well developed and helpful.

Sources for Additional Information
on Measurement and Evaluation



The following two books are highly recommended. Consult the first
one if you want scope and depth. It includes a chapter on item
analysis for classroom tests and extended discussions on standardized
testing. The Tuckman book is considerably shorter and written for the
practitioner. Both touch on Benjamin Bloom's work.

Hopkins, K.D.; Stanley, J.C.; and Hopkins, B.R. Educational and
Psychological Measurement and Evaluation. 7th ed. Englewood
Cliffs, N.J.: Prentice-Hall, 1990.

Tuckman, B.W. Testing for Teachers. 2nd ed. San Diego: Harcourt
Brace Jovanovich, 1988.